Lessons Learned with Arc, an OAI-PMH Service Provider
Authors
Xiaoming Liu, Los Alamos National Laboratory, Research Library, Los Alamos, NM 87545; Kurt Maly, Michael L. Nelson, and Mohammad Zubair, Old Dominion University, Department of Computer Science, Norfolk, VA 23529
Abstract
Web-based digital libraries have historically been built in isolation, using different technologies, protocols, and metadata. These differences hindered the development of digital library services that enable users to discover information from multiple libraries through a single unified interface. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a major international effort to address technical interoperability among distributed repositories. Arc debuted in 2000 as the first end-user OAI-PMH service provider. Since that time, Arc has grown to include nearly 7,000,000 metadata records. Arc has been deployed in a number of environments and has served as the basis for many other OAI-PMH projects, including Archon, Kepler, NCSTRL, and DP9. In this article we review the history of OAI-PMH and Arc, as well as some of the lessons learned while developing Arc and related OAI-PMH services.
Interoperability is one of the significant research problems in the field of digital libraries (DLs) (Lynch & Garcia-Molina, 1995). The inability to federate, filter, and provide value-added services on remote content limits DLs to covering only local holdings. The Open Archives Initiative (OAI) is a major international effort to address technical interoperability and facilitate discovery of content among distributed repositories. OAI differs from other interoperability approaches, such as Z39.50 (Lynch, 1997) or SDLIP (Paepcke et al., 2000), through its emphasis on a limited, simple, and easy-to-implement protocol that layers over an existing repository. The OAI framework defines two functional roles: data providers (also “repositories”) and service providers (also “harvesters”).
Service providers develop value-added services based on the metadata collected from data providers. These value-added services could take the form of cross-archive search engines, linking systems, and peer-review systems.
The roots of the OAI lie in a vision to stimulate the growth of open e-print repositories. This concept began to be developed with the Universal Preprint Service (UPS) prototype (Van de Sompel et al., 2000) and was further advanced with the Santa Fe Convention (Van de Sompel & Lagoze, 2000). The UPS prototype was the discussion piece during an invitation-only workshop in Santa Fe, New Mexico, in the fall of 1999. This workshop brought together many of the leaders in the e-print community for the purpose of fostering interoperability between the various author-contributed e-print servers and institutional repositories in use at the time. Contemporary approaches toward interoperability were ad hoc at best. One of the distinguishing factors of the Santa Fe workshop was the participants' collective experience in building DLs and dealing with the associated interoperability problems; earlier interoperability workshops (Scherlis, 1996) were comparatively premature. The immediate result of this workshop was the Santa Fe Convention, an intermediate step toward the metadata harvesting model that would become the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). With the realization that the simple metadata harvesting idea appealed to a broader range of communities than those engaged in e-print publishing, version 1.0 of the OAI-PMH was released in January 2001. Following an extended period of evaluation and alpha and beta testing, version 2.0 of the OAI-PMH was released as a stable specification in June 2002 (Lagoze, Van de Sompel, Nelson, & Warner, 2002a).
The development, history, impact, and secondary effects of OAI-PMH have been discussed in several publications, including Lynch (2001), Nelson (2001), Lagoze and Van de Sompel (2001), Van de Sompel and Lagoze (2002), and Lagoze and Van de Sompel (2003).
Arc
Arc (http://arc.cs.odu.edu) is the first end-user federated search service based on the OAI-PMH (Liu, Maly, Zubair, & Nelson, 2001). The Repository Explorer (Suleman, 2001) was released prior to Arc, but its target audience is mainly repository developers and maintainers, not end users. Arc was initially released in October 2000 as an experimental service to investigate issues in metadata harvesting. The software developed for the Arc service (http://oaiarc.sourceforge.net/) was released as an open source system under an NCSA-style license in September 2002. It has been used in several production and research projects (see Table 1). Arc was first developed as a proof-of-concept service for OAI-PMH; however, the development of Arc revealed interesting problems and inspired further research in these domains.
In this article we introduce the development and architecture of the Arc system and the follow-up research that attempted to improve or optimize the metadata harvesting system and search performance. We will discuss the Archon project for building value-added services that take advantage of rich metadata beyond Dublin Core (DC) (Weibel & Lagoze, 1997); the DP9 service, which allows general search engines (Google, Yahoo, etc.) to index OAI-PMH-compliant collections; and the DL Grid project, recently funded by the Andrew W. Mellon Foundation, for building a high-performance federated search service. When possible, interesting and general features resulting from these research projects are incorporated back into the publicly available Arc source code distribution.
Development of Arc
Arc was initially released as an experimental service to investigate issues in metadata harvesting. It immediately attracted interest because it was the only vehicle for demonstrating the potential and promise of OAI-PMH at that time. As new data providers appeared, they often asked to be added to the Arc system for demonstration purposes; by continuously integrating new data providers, the software was made stable and fault tolerant. Originally conceived as more of a tour de force, Arc has become a useful tool for helping new data providers make their collections truly OAI-PMH-compliant by giving them feedback on errors encountered during harvesting.
When applying the Arc software in various environments, we encountered a number of problems, such as inconsistent metadata, lack of controlled vocabulary, and XML errors. Based on feedback from other adopters, we have been able to address these problems and have consequently added many new features for customization and installation. The architecture of the Arc system has been refined so that new functionality can be easily added or extended. Arc is available for download (http://sourceforge.net/projects/
Table 1. Other Projects Using the Arc Software
Digital Library | URL | Description
MetaArchive | www.metaarchive.org | Sharing resources related to politics and religion (Halbert, 2003)
NCSTRL | www.ncstrl.org | A collection of computer science technical reports and e-prints (Anan, Liu, & Maly
Similar resources
Arc - An OAI Service Provider for Digital Library Federation
The usefulness of the many on-line journals and scientific digital libraries that exist today is limited by the inability to federate these resources through a unified interface. The Open Archive Initiative (OAI) is one major effort to address technical interoperability among distributed archives. The objective of OAI is to develop a framework to facilitate the discovery of content in distribut...
Metadata Harvesting with R and OAI-PMH
The Open Archives Initiative (http://www.openarchives.org/) develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. One key project is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH, http://www.openarchives.org/pmh/) which provides “a low-barrier mechanism for repository interoperability” for archives (institutiona...
Initial Experiences Re-Exporting Duplicate and Similarity Computation with an OAI-PMH aggregator
The proliferation of the Open Archive Initiative Protocol for Metadata Harvesting (OAI-PMH) has resulted in the creation of a large number of service providers, all harvesting from either data providers or aggregators. If data were available regarding the similarity of metadata records, service providers could track redundant records across harvests from multiple sources as well as provide addi...
Similarity Computations with an OAI-PMH Aggregator
The proliferation of the Open Archive Initiative Protocol for Metadata Harvesting (OAI-PMH) has resulted in the creation of a large number of service providers, all harvesting from either data providers or aggregators. If data were available regarding the similarity of metadata records, service providers could track redundant records across harvests from multiple sources as well as provide addi...
Putting a National Portal for Undergraduate Theses into Production
This paper discusses processes and experiences gained from creating a national portal (Uppsök) for Swedish undergraduate theses, using a common metadata model and set structure with agreements on semantics on top of OAI-PMH and harvesting from several data providers into a central service provider at the Swedish Royal Library.
Journal title: Library Trends
Volume: 53
Publication date: 2005